Abstract

As an extension of prior work, we study inspecific Hebbian learning using the classical Oja model. We use
a combination of analytical tools and numerical simulations to investigate how the effects of inspecificity
(or synaptic "cross-talk") depend on the input statistics. We investigated a variety of input patterns that
appear in dimensions higher than 2, and classified them by covariance type and input bias. The effects of
inspecificity on the learning outcome were found to depend very strongly on the nature of the input, and in
some cases were dramatic, making the existence of a generic neural algorithm that corrects learning
inaccuracy due to cross-talk unlikely. We discuss the possibility that sophisticated learning, such as
presumably occurs in the neocortex, is enabled as much by special proofreading machinery for enhancing
specificity as by special algorithms.

Keywords: Hebbian learning, cross-talk, biased input statistics, negative correlation, spectrum, n-dimensional 
dynamics, bifurcation. 



1 Introduction 

In this paper, we revisit some fundamental questions in computational neuroscience, related to unsupervised
learning in cortical networks. We use a simple model of learning (previously studied by the author) to
study how learning occurs when the model incorporates transmission inspecificity ("synaptic errors"). We
focus in particular on a few crucial questions: To what extent and under which circumstances can synaptic
inspecificity facilitate or prevent learning? Are certain input distributions more easily learned than others,
or more affected by inspecificity? Can a small level of cross-talk induce significant changes (bifurcations)
in the system's asymptotic dynamics?

The paper is organized as follows. In the introduction, we present the model (which we call the
"inspecific Oja model" throughout the paper) and give an overview of the basics of its dynamic behavior
(Section 1.1). In Section 2, we start by investigating numerically how a 3-dimensional inspecific Oja network
processes different classes of input distributions, preserving some of the dynamical aspects found in the
2-dimensional phase plane [12], but also introducing new features, specific to higher dimensions. In Section
3, we study analytically, in an n-dimensional example, the behavior observed numerically in the previous
section. Section 4 interprets the numerical and analytical results in the biological context of a learning
cortical network.
1.1 The inspecific Oja model

Oja [11] showed that a simple neuronal model could perform unsupervised learning based on Hebbian
synaptic weight updates incorporating an implicit "multiplicative" weight normalization, to prevent
unlimited weight growth [10].

Our focus is on studying a single-output network, learning an input distribution according to Oja's
rule [11]. More precisely, the output neuron receives, through a set of n input neurons, n signals
$x = (x_1, \ldots, x_n)^T$ drawn from an input distribution $\mathcal{P}(x)$, $x \in \mathbb{R}^n$, transmitted via synaptic
connections of strengths $\omega = (\omega_1, \ldots, \omega_n)^T$. The resulting scalar output y is generated as the weighted
sum of the inputs, $y = x^T \omega$. The synaptic weights $\omega_i$ are modified by implementing first a Hebb-like
strengthening proportional to the product of $x_i$ and y, followed by an approximate "normalization" step, which
keeps the Euclidean norm of the weight vector close to one. The input covariance matrix $C = \langle x x^T \rangle$ can be
used as an appropriate long-term characterization of the inputs, to study the expected long-term convergence of
the weight vector (i.e., learning), by approximating it with the asymptotic behavior of
$w(t) = \langle \omega(t+1) \mid \omega(t) \rangle$ in the equation [13]:

$$\frac{dw}{dt} = \gamma\,\left[Cw - (w^T C w)\,w\right]$$

Oja [ ] showed that this simple model acts, when applicable, as a principal component analyzer for the 
input distribution. Finding principal components could be very useful in the brain for data compression 
and transmission, since for Gaussian data such representations have statistically optimal properties, and 
often neural signals are approximately Gaussian. 
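The stochastic update described above can be sketched in a few lines. This is a minimal illustration, assuming zero-mean Gaussian inputs with a hypothetical 2 × 2 covariance C and a fixed learning rate γ, and using the standard single-sample form of Oja's rule, $\Delta\omega = \gamma\, y\,(x - y\,\omega)$. The optional matrix E, which multiplies only the Hebbian term $y x$, is our hypothetical stochastic counterpart of the cross-talk introduced below (its average over inputs reproduces $\gamma[ECw - (w^T C w)w]$); it is not necessarily the author's implementation.

```python
import numpy as np


def oja_learn(samples, gamma=0.002, E=None, seed=0):
    """Apply the single-sample Oja update to each row of `samples` and return the final weights.

    If E is given, it multiplies the Hebbian term y*x (hypothetical cross-talk), so that on
    average the update follows gamma*[E C w - (w^T C w) w].
    """
    n = samples.shape[1]
    w = np.random.default_rng(seed).normal(size=n)
    w /= np.linalg.norm(w)                          # start on the unit sphere
    E = np.eye(n) if E is None else E
    for x in samples:
        y = x @ w                                   # scalar output y = x^T w
        w = w + gamma * (E @ (y * x) - y * y * w)   # Hebbian growth minus implicit normalization
    return w


C = np.array([[3.0, 1.0],
              [1.0, 2.0]])                          # hypothetical input covariance
X = np.random.default_rng(1).multivariate_normal(np.zeros(2), C, size=20000)
w = oja_learn(X)

principal = np.linalg.eigh(C)[1][:, -1]             # eigh sorts eigenvalues in ascending order
cos_angle = abs(w @ principal) / np.linalg.norm(w)
print(f"|cos(angle to principal component)| = {cos_angle:.3f}, ||w|| = {np.linalg.norm(w):.3f}")
```

With a small fixed γ the weights fluctuate slightly around the normalized principal eigenvector instead of settling exactly on it; decreasing γ reduces these fluctuations at the cost of slower convergence.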

Recent data suggest [9, 4, 5] that weight updates may influence each other, for example due to
unavoidable residual second messenger diffusion between closely spaced synapses. In our recent work we
examined how such "cross-talk" would affect the Oja model [13]. We formalized learning inspecificity via
an error matrix $E \in M_n(\mathbb{R})$ that has positive entries, is symmetric, and is equal to the identity matrix
$I_n$ in case the error is zero. Consistently with our previous studies in both two [12] and higher
dimensions [ ], we consider the error matrix of the form



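$$E = \begin{pmatrix} q & e & \cdots & e \\ e & q & \cdots & e \\ \vdots & \vdots & \ddots & \vdots \\ e & e & \cdots & q \end{pmatrix} \qquad (1)$$

where $0 < e < 1/n$ is the "transmission error" and $1/n < q < 1$ is the "transmission quality," satisfying
$q + (n-1)e = 1$. The inspecific learning equations become:

$$\frac{dw}{dt} = \gamma\,\left[ECw - (w^T C w)\,w\right] \qquad (2)$$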

We have noted previously that an equilibrium for this system is any vector $w = (w_1, \ldots, w_n)^T$ such that
$ECw = (w^T C w)\,w$, i.e., an eigenvector of EC (with corresponding eigenvalue $\lambda_w$), normalized with
respect to the norm $\|\cdot\|_C = \sqrt{\langle \cdot, \cdot \rangle_C}$ (defined by $\langle v, u \rangle_C = v^T C u$ for all $u, v \in \mathbb{R}^n$), so that $\|w\|_C^2 = \lambda_w$.
Generically, EC has a strictly positive, unique maximal eigenvalue, and the corresponding eigendirection
is orthogonal with respect to $\langle \cdot, \cdot \rangle_C$ to all other eigenvectors of EC.
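The equilibrium condition above translates directly into a numerical recipe: compute the eigenvectors of EC and rescale each so that $w^T C w$ equals its eigenvalue. The sketch below uses a hypothetical helper `equilibria` and example parameters of the form (3) that we chose for illustration (v = 1, δ = (2, 1, 0), c = 0.2, e = 0.2); it checks that the rescaled vectors annihilate the right-hand side of (2) and are mutually orthogonal in $\langle \cdot, \cdot \rangle_C$.

```python
import numpy as np


def error_matrix(n, e):
    """Error matrix (1): q on the diagonal and e elsewhere, with q + (n - 1) e = 1."""
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def equilibria(C, E):
    """Eigenvalues of EC (descending) and matching equilibria of (2), scaled so w^T C w = lambda."""
    lam, V = np.linalg.eig(E @ C)
    lam, V = lam.real, V.real          # EC is similar to C^(1/2) E C^(1/2), so its spectrum is real
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    quad = np.einsum("ik,ij,jk->k", V, C, V)    # v_k^T C v_k for every column k
    return lam, V * np.sqrt(lam / quad)         # enforce ||w||_C^2 = lambda_w


C = np.array([[3.0, 0.2, 0.2],
              [0.2, 2.0, 0.2],
              [0.2, 0.2, 1.0]])
E = error_matrix(3, 0.2)
lam, W = equilibria(C, E)

residuals = [np.linalg.norm(E @ C @ w - (w @ C @ w) * w) for w in W.T]
gram_C = W.T @ C @ W                            # C-inner products between the equilibria
print("eigenvalues of EC:", lam)
print("largest residual of (2):", max(residuals))
print("off-diagonal C-inner products:", gram_C[~np.eye(3, dtype=bool)])
```

Each column of `W` (and its negative) is an equilibrium; only the one belonging to the largest eigenvalue is attracting, as discussed next.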

We have also shown that the eigenvalues of the Jacobian matrix at an equilibrium w are given by
$-2\gamma\lambda_w$ and $-\gamma[\lambda_w - \lambda_{v_j}]$, $j = 1, \ldots, n-1$, where $\lambda_w$ and $\lambda_{v_j}$ are the n eigenvalues of EC (noting first
that $B_w = \{w, v_1, \ldots, v_{n-1}\}$, the completion of w to a basis of eigenvectors of EC orthogonal with respect to
$\langle \cdot, \cdot \rangle_C$, also forms an eigenvector basis for the Jacobian). We concluded that, if EC has a
unique largest eigenvalue, then a normalized eigenvector w is a locally hyperbolic attracting equilibrium for
(2) iff it corresponds to the maximal eigenvalue of EC. If EC has a multiple largest eigenvalue, the system
will exhibit a set of nonisolated, neutrally attracting equilibria (all normalized eigenvectors spanning the
principal eigenspace, in this case of dimension $\ge 2$). Some of the computational details are summarized
in Appendix 1 (e.g., a description of the attraction basins, supporting the absence of cycles in the phase
space) and are further expanded in our previous work [13, 12].
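The Jacobian spectrum quoted above can be checked numerically. Differentiating the right-hand side of (2) gives $J(w) = \gamma[EC - (w^T C w)I - 2\,w\,w^T C]$ for symmetric C; the sketch evaluates it at each equilibrium and compares its eigenvalues with $-2\gamma\lambda_w$ and $-\gamma(\lambda_w - \lambda_{v_j})$. The matrix, the error e = 0.2 and γ = 0.5 are illustrative choices, not values taken from the paper.

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def jacobian(w, C, E, gamma):
    """Jacobian of gamma*[E C w - (w^T C w) w] with respect to w, for symmetric C."""
    n = len(w)
    return gamma * (E @ C - (w @ C @ w) * np.eye(n) - 2.0 * np.outer(w, C @ w))


C = np.array([[3.0, 0.2, 0.2],
              [0.2, 2.0, 0.2],
              [0.2, 0.2, 1.0]])
E = error_matrix(3, 0.2)
gamma = 0.5

lam, V = np.linalg.eig(E @ C)
lam, V = lam.real, V.real
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

matches, stable = [], []
for k in range(3):
    v_k = V[:, k]
    w = v_k * np.sqrt(lam[k] / (v_k @ C @ v_k))            # equilibrium with w^T C w = lambda_k
    J_eigs = np.sort(np.linalg.eigvals(jacobian(w, C, E, gamma)).real)
    others = np.delete(lam, k)
    predicted = np.sort(np.concatenate(([-2.0 * gamma * lam[k]], -gamma * (lam[k] - others))))
    matches.append(bool(np.allclose(J_eigs, predicted)))
    stable.append(bool(np.all(J_eigs < 0)))

print("Jacobian spectra match the prediction:", matches)
print("locally attracting:", stable)   # only the equilibrium of the largest eigenvalue
```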

Since the nature and position of the equilibria depend on the spectral properties of EC, the next
task is, naturally, to study the spectral changes of EC when perturbing the system by increasing the
transmission inspecificity. In our previous work on the model, we found that the effects of perturbations on
the system's dynamics can depend very strongly on the characteristics of the input distribution (correlation
sign, degree of bias). In our first study we only considered learning of positively correlated n-dimensional
input distributions, and found a smooth degradation of the learning outcome with increasing error, but no
sudden changes in dynamics [13]. In our second study, we discovered that negatively correlated inputs can
induce a bifurcation (a stability swap of equilibria, through a critical stage) when increasing the error, even
in a system as simple as a two-dimensional one; this bifurcation only occurred, however, in the case of
unbiased inputs [12].

Here, we extend this work and investigate the effects of cross-talk in higher-dimensional
networks, when learning a variety of classes of input distributions, both biased and unbiased. More
precisely, we will consider as potential covariance matrices all combinations of the form:

$$C = \begin{pmatrix} v + \delta_1 & \pm c & \cdots & \pm c \\ \pm c & v + \delta_2 & \cdots & \pm c \\ \vdots & \vdots & \ddots & \vdots \\ \pm c & \pm c & \cdots & v + \delta_n \end{pmatrix} \qquad (3)$$

where we can assume without loss of generality that $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_n \ge 0$. For any k < n, we will say
that the input has bias loss of order k if $\delta_1 = \cdots = \delta_{k+1}$. We hypothesize that, even though the background
covariance $\pm c$ is taken for simplicity to be uniform in absolute value, the inspecific learning rule will lead
to interesting dynamics, in particular when the inputs exhibit some degree of cross-correlation.

Since our analysis will focus on symmetric matrices C with possibly nonzero off-diagonal elements, we must
first ask whether, and when, such a matrix can be the covariance matrix of a distribution of n-dimensional
vectors. While establishing equivalent conditions may be difficult even in small dimensions [14], a simple
sufficient criterion valid in any dimension is diagonal dominance. It is known that a symmetric diagonally
dominant matrix with real, non-negative diagonal entries is positive semi-definite, hence a covariance matrix,
by the finite-dimensional case of the spectral theorem¹. If we are willing to impose
$v + \delta_n > (n-1)|c|$ as a (biologically plausible) upper bound on how large the input cross-correlations |c|
can be with respect to the auto-correlations v, diagonal dominance clearly follows, and C is thus automatically
guaranteed to be a covariance matrix. An interesting direction would be to interpret biologically the
significance of an n-dimensional input distribution with negative correlations [7]; this question is, however,
beyond the scope of this paper.
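The sufficient condition above is easy to check, and footnote 1 suggests how to sample inputs with a prescribed C. The sketch builds a matrix of the form (3) from hypothetical parameters (n = 4, v = 1, δ = (0.5, 0.3, 0.1, 0), |c| = 0.25, an arbitrary mixed sign pattern), verifies diagonal dominance and positive definiteness, and draws samples as $\sqrt{C}X$ with $\mathrm{cov}(X) = I$. The function names are illustrative only.

```python
import numpy as np


def covariance_pattern(v, deltas, c_abs, signs):
    """Matrix of the form (3): v + delta_i on the diagonal, signs[i][j] * |c| off the diagonal.

    `signs` is a symmetric array of +1/-1 entries; its diagonal is ignored.
    """
    deltas = np.asarray(deltas, dtype=float)
    n = len(deltas)
    C = c_abs * np.asarray(signs, dtype=float)
    C[np.diag_indices(n)] = v + deltas
    return C


def is_diagonally_dominant(C):
    """True if |C_ii| >= sum_{j != i} |C_ij| for every row i."""
    diag = np.abs(np.diag(C))
    off = np.abs(C).sum(axis=1) - diag
    return bool(np.all(diag >= off))


def sample_inputs(C, size, seed=0):
    """Zero-mean samples with covariance C, drawn as sqrt(C) X with cov(X) = I (footnote 1)."""
    evals, evecs = np.linalg.eigh(C)
    sqrtC = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T   # symmetric square root
    X = np.random.default_rng(seed).standard_normal((size, C.shape[0]))
    return X @ sqrtC          # row i equals (sqrtC @ X[i])^T because sqrtC is symmetric


signs = np.array([[ 1, -1,  1, -1],
                  [-1,  1,  1,  1],
                  [ 1,  1,  1, -1],
                  [-1,  1, -1,  1]])
C = covariance_pattern(1.0, [0.5, 0.3, 0.1, 0.0], 0.25, signs)
print("v + delta_n > (n - 1)|c|:", 1.0 + 0.0 > 3 * 0.25)
print("diagonally dominant:", is_diagonally_dominant(C))
print("smallest eigenvalue:", np.linalg.eigvalsh(C).min())
S = sample_inputs(C, 200_000)
print("max |sample covariance - C|:", np.abs(np.cov(S, rowvar=False) - C).max())
```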



2 Classes of inputs and bias effects on 3-dimensional dynamics 

We study here how input patterns shape the effects of inspecificity on the dynamics of a 3-dimensional
network: the lowest dimension for which the question applies, and one which captures the essence of this
behavior even in higher-dimensional systems. In this section, we inspect all combinatorial possibilities of
cross-correlation sign and auto-correlation bias, and determine the effect of increasing error on the dynamics
in each case. In the next section, we support with rigorous proofs the main results obtained here through
numerical simulations (we used the Matlab software, version 7.2.1).

¹ If X is an n × 1 column-vector-valued random variable whose covariance matrix is the n × n identity matrix,
then $\mathrm{cov}(\sqrt{C}X) = \sqrt{C}\,\mathrm{cov}(X)\,\sqrt{C} = C$.

2.1 Input covariance patterns 

We studied separately all combinatorial possibilities for the input statistics, with covariances of uniform
absolute value; in other words, we considered covariance matrices of the form:

$$C = \begin{pmatrix} v + \delta_1 & \pm c & \pm c \\ \pm c & v + \delta_2 & \pm c \\ \pm c & \pm c & v \end{pmatrix} \qquad (4)$$

where, as before, $v > 2|c|$, and $\delta_1 \ge \delta_2 \ge 0$ (i.e., allowing bias of any order).

We first note that, based on the number of negative above-diagonal entries of C, we distinguish 4
combinatorial classes: (A) all positive covariances (one configuration), (B) one negative entry (3
configurations), (C) two negative entries (3 configurations), and (D) all negative entries (one configuration).
We will study the spectra of the inspecific matrices EC, and how they differ across classes of C and degrees
of bias: from fully biased ($\delta_1 > \delta_2 > 0$) to partly biased ($\delta_1 = \delta_2 > 0$) to fully unbiased ($\delta_1 = \delta_2 = 0$). In this
section, we illustrate the behavior of the eigenvalues of EC as the quality q changes in the interval (1/3, 1]
(representing quality higher than error).
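The combinatorial classes can be enumerated mechanically. The sketch below, a hypothetical Python/NumPy helper rather than the Matlab code used for the figures, builds all eight sign patterns of matrix (4), labels them A-D by the number of negative above-diagonal entries, and returns the spectrum of EC along a list of q values (using q + 2e = 1 in three dimensions).

```python
import itertools
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def configurations_3d(v, d1, d2, c_abs):
    """Yield (class label, C) for the 8 sign patterns of matrix (4)."""
    labels = {0: "A", 1: "B", 2: "C", 3: "D"}
    pairs = [(0, 1), (0, 2), (1, 2)]                      # above-diagonal positions
    for signs in itertools.product([1, -1], repeat=3):
        C = np.diag(np.array([v + d1, v + d2, v], dtype=float))
        for (i, j), s in zip(pairs, signs):
            C[i, j] = C[j, i] = s * c_abs
        yield labels[signs.count(-1)], C


def spectrum_vs_q(C, qs):
    """Eigenvalues of EC, sorted in descending order, for each quality value q (n = 3)."""
    out = []
    for q in qs:
        e = (1.0 - q) / 2.0                               # q + 2e = 1
        lam = np.linalg.eigvals(error_matrix(3, e) @ C).real
        out.append(np.sort(lam)[::-1])
    return np.array(out)


counts = {}
for label, C in configurations_3d(1.0, 0.0, 0.0, 0.2):   # fully unbiased example
    counts[label] = counts.get(label, 0) + 1
    print(label, np.round(spectrum_vs_q(C, [1.0, 0.8, 0.6]), 3).tolist())
print(counts)  # {'A': 1, 'B': 3, 'C': 3, 'D': 1}
```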



For fully biased inputs ($\delta_1 > \delta_2 > 0$), the behavior is indistinguishable between classes²: the largest
eigenvalue remains separated from the second largest over the whole range of q (as shown for one example
in Figure 1a), so that the leading eigenvector drifts gradually away from the direction of the principal
component of C (blue curve in Figure 1a). For any value of q, the system has two hyperbolically attracting
equilibria (the normalized principal eigenvectors of EC, whose basins are separated by an invariant plane).
In Figure 2 we show the evolution of a set of trajectories, to illustrate convergence to the two attractors in
the phase space, as well as the dynamics within the separating plane. (We will encounter similar behavior for
other classes of inputs as well; for these we refer to the same figure, since the same phase space evolution
remains a qualitatively accurate depiction.)
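The drift described above can be quantified by tracking, as q decreases, the angle between the leading eigenvector of EC and the principal component of C, together with the gap between the two largest eigenvalues. The sketch uses the Figure 1a parameters (v = 1, |c| = 0.2, δ1 = 2, δ2 = 1) with all cross-correlations positive, which is one arbitrary choice among the classes; it is an illustration, not the code behind the figure.

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def leading(M):
    """Largest and second largest eigenvalues of M, and its unit leading eigenvector."""
    lam, V = np.linalg.eig(M)
    lam, V = lam.real, V.real
    order = np.argsort(lam)[::-1]
    u = V[:, order[0]]
    return lam[order[0]], lam[order[1]], u / np.linalg.norm(u)


C = np.array([[3.0, 0.2, 0.2],
              [0.2, 2.0, 0.2],
              [0.2, 0.2, 1.0]])
_, _, u_C = leading(C)                                # principal component of C

qs = np.linspace(1.0, 0.34, 661)
gaps, angles = [], []
for q in qs:
    lam1, lam2, u = leading(error_matrix(3, (1.0 - q) / 2.0) @ C)
    gaps.append(lam1 - lam2)
    angles.append(np.degrees(np.arccos(min(1.0, abs(u @ u_C)))))

print(f"smallest gap between the two leading eigenvalues: {min(gaps):.3f}")
print(f"angle to the principal component: {angles[0]:.1e} deg at q = 1, {angles[-1]:.1f} deg at q = 0.34")
```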

For loss of bias of order one ($\delta_1 = \delta_2 > 0$), we distinguish three possible types of behavior.

Separated leading eigenvalues (behavior "typical" of class A, as illustrated in Figure 1b). This corresponds
to a slow depreciation of the learned vector as q decreases (blue curve in Figure 1b). The phase space
behavior resembles qualitatively that for fully biased inputs, illustrated in Figure 2.

Crossing of leading eigenvalues (behavior "typical" of class D, as illustrated in Figure 1c and further
discussed in Section 3), which produces a sudden swap of the attractors from one eigendirection to
another, orthogonal, one (a phenomenon we have described previously in a 2-dimensional model [12]). This
corresponds to a crash in learning at a critical value of the quality q*, which depends on parameters as
$q^* = (v + \delta + c)/(v + \delta - c)$, with $\delta = \delta_1 = \delta_2$ and c < 0. Low inspecificity (q > q*) has in fact no effect on
learning in this case: although the leading eigenvalue changes, the principal direction remains the same, so
the system will converge generically to the same outcome as in the absence of error. This may seem like a
very desirable input distribution to learn in the presence of inspecificity; however, one has to keep in mind
that, if the cross-correlations are small in absolute value |c|, then q* will be very close to 1. Such perfect
learning will therefore only happen when inspecificity is almost insignificantly small. This becomes all the
more disturbing when we recall that at the

² Since the spectra depend qualitatively on all parameter values, we present here the results of a numerical
investigation, rather than a rigorous analytical study, which would be extremely cumbersome. In contrast, we
will later prefer an analytical approach to the classification in the fully unbiased case, where the computations
become more tractable.
[Figure 1 panels, each plotted against the quality value q: spectral changes with error for fully biased inputs; for partly biased inputs (separate eigenvalues); for partly biased inputs (eigenvalue swap); for partly biased inputs (avoided crossing); for fully unbiased inputs (class A); for fully unbiased inputs (class C).]

Figure 1: Spectral changes induced by increasing inspecificity, for various input schemes. In all panels
we show, with respect to the quality $q = 1 - 2e$: the evolution of the eigenvalues, color-coded black for the largest
eigenvalue, red for the second largest and green for the lowest (top subplot); and the angle between the inspecific
stable vector and the correct attracting direction(s) (bottom subplot). In all panels, v = 1, |c| = 0.2. The classification
is as follows. A. For fully biased inputs ($\delta_1 = 2$, $\delta_2 = 1$), the three eigenvalues remain separated. For partly biased
inputs ($\delta_1 = \delta_2 = 1$), there are three cases, depending on the number of negative cross-correlations and on their
placement: the leading eigenvalues can remain separated (B), they can cross at a critical value q = q* (C), or they can
approach significantly for some value of q, but "avoid" crossing (D). For fully unbiased inputs, we found four cases,
classified simply by the number of negative off-diagonal cross-correlations (and not by their geometry): all positive
cross-correlations, where the leading eigenvalues remain separated (E); one negative cross-correlation, where the
leading eigenvalues coincide only at q = 1 and immediately separate (F); two negative cross-correlations, where the
leading eigenvalues may approach each other in an avoided crossing of magnitude depending on parameters, but
remain separated (G); all negative cross-correlations, where the leading eigenvalues coincide on a whole interval, as
quality depreciates from q = 1 to a critical value (H). In this case, the system has a curve of half-neutral attractors,
which persists until q reaches the critical value, when a different, orthogonal, eigenvector takes over as the stable
direction.
Figure 2: Phase space trajectories for fully biased inputs. A. In the absence of error, the system converges
generically to the two normalized vectors in the principal direction $w_C$ of the covariance matrix C. The attraction
basins are separated by the subspace $(w, w_C) = 0$ (the shaded plane). B. For error e = 0.2, the system converges
generically to the two normalized vectors in the principal direction $w_{EC}$ of the modified covariance matrix EC. The
attraction basins are separated by the subspace $(w, w_{EC}) = 0$ (the shaded plane). Parameters used: v = 1, c = 0.2,
$\delta_1 = 2$, $\delta_2 = 1$. Color coding: trajectories evolve in time from darker towards lighter shades.

end of the "good" interval lies the bifurcation, crashing the equilibria to a completely irrelevant direction;
so any fault of the system in the direction of slightly miscalculating the limits of the permissible error
would have dire consequences. If the network does not have an additional, good estimator of its degree
of inspecificity, it may not only learn an irrelevant outcome, but also have no knowledge of it. In Figure
3, we show three phase space plots: before, at and after the bifurcation point. While Figures 3a and
3c illustrate the typical phase space with two hyperbolically stable equilibria (in Figure 3a representing accurate,
error-free learning, in Figure 3c inaccurate learning for a post-critical error), the phase space at the bifurcation
is qualitatively different: the system has no hyperbolic attractors, but rather a closed curve (ellipse) of
half-stable equilibria (neutral along the direction of the curve). Clearly, the outcome of learning is in this
case extremely dependent on the initial conditions (although, as commented in previous work [12], the
stochastic version of the system will rather have noise-driven stationary solutions that drift around this
attracting ellipse). If this phase space dynamics were specific only to this bifurcation, one might find it
justified to overlook its occurrence in the context of generic dynamics. However, this is not the case; as
shown below, there are classes of inputs for which such an attracting ellipse represents the natural state
of the system, and persists over an entire range of inspecificity.

"Avoided crossing" of the eigenvalues (a hybrid behavior observed in mixed cases from classes B and
C, in which the eigenvalues approach each other, without actually crossing, at a value q = q* that depends
on all parameter values). While the principal eigenvectors never swap in this case, the learned direction
depreciates very rapidly around q* (see the blue curve in Figure 1d).
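The three behaviors above can be screened numerically by scanning the gap between the two largest eigenvalues of EC along q. The sketch below is our own illustration (not the author's Matlab code) for partly biased inputs with v = 1, δ1 = δ2 = 1 and |c| = 0.2, as in Figure 1. For the all-negative configuration it also checks that the leading eigenvector jumps from $(1, -1, 0)/\sqrt{2}$ to an orthogonal direction as q crosses the critical value $q^* = (v + \delta + c)/(v + \delta - c)$, with $c = -|c|$.

```python
import itertools
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def leading_pair(M):
    """Gap between the two largest eigenvalues of M, and its unit leading eigenvector."""
    lam, V = np.linalg.eig(M)
    lam, V = lam.real, V.real
    order = np.argsort(lam)[::-1]
    u = V[:, order[0]]
    return lam[order[0]] - lam[order[1]], u / np.linalg.norm(u)


v, delta, c_abs = 1.0, 1.0, 0.2
qs = np.linspace(1.0, 0.34, 661)
pairs = [(0, 1), (0, 2), (1, 2)]
min_gaps = {}
for signs in itertools.product([1, -1], repeat=3):
    C = np.diag([v + delta, v + delta, v])
    for (i, j), s in zip(pairs, signs):
        C[i, j] = C[j, i] = s * c_abs
    gaps = [leading_pair(error_matrix(3, (1 - q) / 2) @ C)[0] for q in qs]
    k = int(np.argmin(gaps))
    min_gaps[signs] = (gaps[k], qs[k])
    print(f"signs {signs}: smallest gap {gaps[k]:.4f} at q = {qs[k]:.3f}")

# All-negative configuration: compare the eigenvector just above and just below q*.
c = -c_abs
q_star = (v + delta + c) / (v + delta - c)
C_D = np.diag([v + delta, v + delta, v]) + c * (np.ones((3, 3)) - np.eye(3))
u_swap = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
_, u_above = leading_pair(error_matrix(3, (1 - (q_star + 0.02)) / 2) @ C_D)
_, u_below = leading_pair(error_matrix(3, (1 - (q_star - 0.02)) / 2) @ C_D)
print(f"q* = {q_star:.4f}; |cos| with (1,-1,0)/sqrt(2): "
      f"{abs(u_above @ u_swap):.3f} above q*, {abs(u_below @ u_swap):.3f} below q*")
```

Configurations whose smallest gap is small but located away from any exact degeneracy are candidates for the "avoided crossing" behavior; the grid resolution limits how precisely the minimum is located.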

For bias loss of order two, the computations are much simplified by the absence of bias, so we can carry
out a complete classification analytically. The result is presented concisely in Theorem 2.1, explained in
more detail in its proof, and then further interpreted in the remainder of the section.

Theorem 2.1. For order two input bias, $\delta_1 = \delta_2 = 0$, the dynamic behavior of the system is determined by
the sign class (A-D) of the input covariance.

Proof. For order two input bias, all classes A-D can be generated from three symbolic structures: 
Figure 3: Bifurcation in attractor dynamics for partly biased inputs, all negative cross-correlations.

A. For small error, the attractors (the two normalized principal eigenvectors of EC) do not differ much from
the correct attractors (the two normalized principal eigenvectors of C). The attraction basins are separated by the
subspace $(w, w_C) = 0$ (the shaded plane). B. For the critical error $e^* = -c/(v + \delta - c)$, the system exhibits an ellipse
of neutrally stable equilibria (yellow curve contained in the shaded plane). C. For error past the critical value, the
attractors have moved significantly far from the correct positions. Parameters used: v = 1, |c| = 0.2 (c < 0),
$\delta = \delta_1 = \delta_2 = 1$. Color coding: trajectories evolve in time from darker towards lighter shades.
Class A corresponds to Structure $C_1$ with c > 0, and Class D to Structure $C_1$ with c < 0. Class B
can be obtained from Structures $C_2$ and $C_3$ with c > 0, while Class C can be obtained from Structures
$C_2$ and $C_3$ with c < 0.

Computing the spectrum of $C_1$ directly, we get one simple eigenvalue $\xi_1 = v + 2c$ (whose eigenvector
is also error-independent) and one double eigenvalue $\xi_2 = (1 - 3e)(v - c)$. If c > 0 (Class A), $\xi_1$ always
dominates (Figure 1e). If c < 0 (Class D), the double eigenvalue $\xi_2 = (1 - 3e)(v - c)$ takes over for error
smaller than the critical value, $e < -c/(v - c)$ (Figure 1h).

Also by direct computation, one notices that $C_2$ and $C_3$ have the same spectral decomposition. One
eigenvalue is given by $\xi_1 = (1 - 3e)(v + c)$, while the other two, $\xi_2 > \xi_3$, are the roots of the quadratic
polynomial

$$P(\lambda) = \lambda^2 + (c - 2v - 5ec + 3ev)\,\lambda + (6ec^2 - cv - 3ev^2 - 2c^2 + v^2 + 3ecv).$$

It is easy to see that $P(\xi_1) = -8ec(1 - 3e)(v + c)$. If c > 0 (Class B), then $P(\xi_1) < 0$, hence $\xi_2 > \xi_1$, with
equality at e = 0, and $\xi_1 > \xi_3$, with equality when e = 1/3 (Figure 1f). If c < 0 (Class C), then $P(\xi_1) > 0$ and
$\xi_1 < (\xi_2 + \xi_3)/2$, hence $\xi_1 \le \xi_3 < \xi_2$, with equality $\xi_1 = \xi_3$ when e = 0 and e = 1/3 (Figure 1g). □
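The closed forms used in the proof can be checked numerically. Because the symbolic structures are not displayed explicitly, the sketch assumes $C_1 = (v - c)I + cJ$ (uniform cross-correlation c, with J the all-ones matrix) and, as a hypothetical representative of Structures $C_2$/$C_3$, the matrix with rows (v, c, c), (c, v, -c), (c, -c, v); for that choice it compares the eigenvalue $(1 - 3e)(v + c)$, the roots of P and the identity for $P(\xi_1)$ with numerical eigenvalues of EC.

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def check_unbiased(v, c, e):
    """Compare the closed-form spectra of the proof with numerical eigenvalues of EC (n = 3)."""
    E = error_matrix(3, e)
    # Structure with uniform cross-correlation c (classes A and D).
    C1 = (v - c) * np.eye(3) + c * np.ones((3, 3))
    num1 = np.sort(np.linalg.eigvals(E @ C1).real)
    ref1 = np.sort([v + 2 * c, (1 - 3 * e) * (v - c), (1 - 3 * e) * (v - c)])
    # Assumed representative with one entry of opposite sign (classes B and C).
    C2 = np.array([[v, c, c], [c, v, -c], [c, -c, v]])
    xi1 = (1 - 3 * e) * (v + c)
    P = [1.0,
         c - 2 * v - 5 * e * c + 3 * e * v,
         6 * e * c**2 - c * v - 3 * e * v**2 - 2 * c**2 + v**2 + 3 * e * c * v]
    num2 = np.sort(np.linalg.eigvals(E @ C2).real)
    ref2 = np.sort(np.concatenate(([xi1], np.roots(P).real)))
    identity = np.isclose(np.polyval(P, xi1), -8 * e * c * (1 - 3 * e) * (v + c))
    return bool(np.allclose(num1, ref1)), bool(np.allclose(num2, ref2)), bool(identity)


for c in (0.2, -0.2):
    for e in (0.0, 0.1, 0.2, 0.3):
        print(f"c = {c:+.1f}, e = {e:.1f}:", check_unbiased(1.0, c, e))
```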

The theorem allows some immediate class-specific interpretations in terms of phase space dynamics
and learning.




Figure 4: Bifurcation in attractor dynamics for fully unbiased inputs, all negative cross-correlations.

A. For small error, the system has an ellipse of neutrally stable equilibria (yellow curve). This ellipse is stable, in
the sense that it persists for a whole interval of errors, from e = 0 until $e = -c/(v - c)$. B. For error past the critical
value, the ellipse is destroyed, and the new attractors lie significantly far from the plane of the ellipse. Parameters
used: v = 1, |c| = 0.2 (c < 0), $\delta_1 = \delta_2 = 0$. Color coding: trajectories evolve in time from darker towards lighter shades.

Class A. The leading eigenvalue is constant, and always separated from the second (double) eigenvalue.
Moreover, the principal component of EC does not change when the error increases, so in this case the
learning is fully accurate for any degree of inspecificity (Figure 1e). This is a class of input statistics that
is completely error-proof.

Class B. In the error-free case, the matrix C has a double leading eigenvalue, and the system has a
whole closed curve of neutrally attracting equilibria (in the eigenplane spanned by the corresponding
eigenvectors). When error is introduced, the two leading eigenvalues segregate, and one of the eigenvectors
takes over, which determines an immediate complete switch in the learning outcome. In this case, even
the smallest degree of inspecificity leads to favoring one specific direction, slightly detached from the plane
where the "real" equilibria are contained (notice that the cosine of the accuracy angle, represented by the
blue curve in Figure 1f, does not fall too far from the perfect value cos(θ) = 1). We may interpret this as the
error helping the system "make up its mind" in the presence of too much ambiguity in the input statistics.

Class C. This falls within the typical case of separated leading eigenvalues, where the system learns, for
any error value, the leading eigendirection of EC (which degrades smoothly from the principal component
of C; see Figure 1g, and Figure 2). Depending on parameters, the eigenvalue curves with respect to q
may exhibit a significant point of minimal separation (see "avoided crossing"), where the learning outcome
(leading eigenvector of EC) deteriorates very fast.

Class D. In the error-free case, the matrix C has a double leading eigenvalue, and the system again has
a whole curve of neutral equilibria, contained in the corresponding eigenplane. When subject to errors up
to a critical value, i.e., for $q > q^* = (v + c)/(v - c)$ (with c < 0), the leading eigenvalues change, but remain
equal; furthermore, the subspace spanned by the two corresponding eigenvectors remains unchanged, hence the
learning process retains the original ambiguity. Past the critical error value, the eigenvalues swap, and the
eigendirection of the new leading eigenvalue (of multiplicity one) is orthogonal to the previous plane (Figure 1h).
In other words, past the critical error value, the system will learn, but with such low accuracy that the result
of learning is useless.



3 An analytical application in higher dimensions 

We work out an analytical computation which suggests that the cases described in Section 2 can
be extended to classify the behavior of the higher-dimensional inspecific Oja system. For simplicity, we
consider only one application, to negatively cross-correlated inputs:



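$$C = \begin{pmatrix} v + \delta_1 & -|c| & \cdots & -|c| \\ -|c| & v + \delta_2 & \cdots & -|c| \\ \vdots & \vdots & \ddots & \vdots \\ -|c| & -|c| & \cdots & v + \delta_n \end{pmatrix} \qquad (5)$$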



Although this is perhaps the least biologically sound case, we feel that it is mathematically the most
interesting, and it describes the scenario opposite to the case of all positively cross-correlated inputs (which
is mathematically the least interesting). As suggested by the numerical computations in 3 dimensions,
covariance matrices that exhibit other positive/negative patterns of cross-correlations are expected to produce
hybrid dynamics between these two extremes. These dynamics will depend not only on the number
of negative correlations, but also on their distribution within the covariance matrix. A random matrix
analysis may be able to classify the behavior for all input patterns, but this is not within the scope of this
study. In this section, we only present the main analytical results we obtained for our application; proofs
of the statements and additional comments can be found in Appendix 2.

Fully biased case. We first consider the covariance biases $\delta_j$ to be distinct: $\delta_1 > \delta_2 > \cdots > \delta_{n-1} > \delta_n = 0$.
We want to study the eigenvalues and eigenspaces of the modified covariance matrix EC. The
characteristic polynomial of EC can be expressed as (see details in Appendix 2):



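$$\Delta(\lambda) = \det(EC - \lambda I) = \begin{vmatrix} X_1(\lambda) & f_2 & \cdots & f_n \\ f_1 & X_2(\lambda) & \cdots & f_n \\ \vdots & \vdots & \ddots & \vdots \\ f_1 & f_2 & \cdots & X_n(\lambda) \end{vmatrix}$$

where, for all $j = 1, \ldots, n$, we denote $f_j = e(v + \delta_j - c) + c$ and $X_j(\lambda) = q(v + \delta_j - c) + c - \lambda$, with
$c = -|c| < 0$ the (negative) cross-correlation.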
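The determinant above encodes a simple structure: EC equals $\mathrm{diag}(\Lambda_1, \ldots, \Lambda_n)$ plus the rank-one matrix $\mathbf{1} f^T$, where $\mathbf{1}$ is the all-ones vector and $\Lambda_j$ is the quantity defined just below. The sketch checks this numerically and uses the matrix determinant lemma to write $\Delta(\lambda) = \prod_j (\Lambda_j - \lambda)\,[1 + \sum_j f_j/(\Lambda_j - \lambda)]$. This factorization is our own device for illustration, and the argument in Appendix 2 may proceed differently; the parameters (n = 4, v = 1, c = -0.2, e = 0.05, δ = (1, 0.6, 0.3, 0)) are hypothetical.

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def structured_EC(v, deltas, c, e):
    """Build EC from f_j and X_j(0) as in the determinant above (c < 0)."""
    d = v + np.asarray(deltas, dtype=float) - c      # v + delta_j - c
    n = len(d)
    q = 1.0 - (n - 1) * e
    f = e * d + c                                    # f_j: off-diagonal entries of column j
    M = np.tile(f, (n, 1))
    M[np.diag_indices(n)] = q * d + c                # X_j(0) on the diagonal
    Lam = (q - e) * d                                # Lambda_j = X_j(0) - f_j
    return M, f, Lam


v, c, e = 1.0, -0.2, 0.05
deltas = [1.0, 0.6, 0.3, 0.0]
n = len(deltas)
C = np.diag(v + np.array(deltas)) + c * (np.ones((n, n)) - np.eye(n))
M, f, Lam = structured_EC(v, deltas, c, e)
print("EC matches the structured matrix:", np.allclose(error_matrix(n, e) @ C, M))

# EC = diag(Lambda) + 1 f^T, so the determinant lemma factorizes Delta(lambda):
lam = 0.37                                           # any value different from every Lambda_j
lhs = np.linalg.det(M - lam * np.eye(n))
rhs = np.prod(Lam - lam) * (1.0 + np.sum(f / (Lam - lam)))
print("Delta(lambda) agrees with the factorized form:", np.isclose(lhs, rhs))
```

The factorized form shows why the points $\Lambda_j$ partition the real line: between consecutive poles, the bracketed term changes sign in a way controlled by the signs of the $f_j$.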

We consider $\Lambda_j = (q - e)(v + \delta_j - c)$; clearly, $\Lambda_1 > \Lambda_2 > \cdots > \Lambda_n$. In Appendix 2, we show how
these values can be used to partition the real line and separate the roots of $\Delta$. In particular, we prove the
following:

Proposition 3.1. In the biased case $\delta_1 > \delta_2 > \cdots > \delta_n$, the matrix EC has n real distinct eigenvalues
$\xi_1 > \xi_2 > \cdots > \xi_n$.

We can define, as in the 2- and 3-dimensional applications, the "critical" error values $e^*_j$, for which
$f_j(e^*_j) = 0$, for all $j = 1, \ldots, n$:

$$e^*_j = \frac{-c}{v + \delta_j - c} \qquad (6)$$

so that $0 < e^*_1 < e^*_2 < \cdots < e^*_n$ (since $\delta_1 > \delta_2 > \cdots > \delta_n$). Clearly, for all $j = 1, \ldots, n$, we have $f_j > 0$ iff
$e > e^*_j$. As e increases from 0 to 1/n, it traverses the values $e = e^*_j$. When e is in the intervals between two
Figure 5: A simple example of how the characteristic polynomial $\Delta$ of EC and its roots change as the
quality q decreases, for dimension n = 3 and fixed parameters v = 1, c = -0.2, $\delta_j = (4 - j)/3$, so that $e^*_1 \approx 0.091$,
$e^*_2 \approx 0.107$, $e^*_3 \approx 0.130$, for j = 1, 2, 3. Each color represents a different value of q: q = 0.98 (red), q = 0.805
(blue), q = 0.76 (green), q = 0.6 (pink). The continuous curves correspond to the graph of the polynomial for
different values of q, and the bullets represent (along the x-axis) the points $\Lambda_j = (q - e)(v + \delta_j - c)$, for j = 1, 2, 3. The
figure shows how the position of the roots of $\Delta$ changes relative to the points of the partition $\Lambda_3 < \Lambda_2 < \Lambda_1$
(which in turn travel down the axis as q decreases). For q = 0.98 (i.e., $e = 0.01 < e^*_1$), $\Lambda_1 > \xi_1 > \Lambda_2 > \xi_2 > \Lambda_3 > \xi_3$.
For q = 0.805 (where $e = 0.0975 \in [e^*_1, e^*_2]$), $\xi_1 > \Lambda_1 > \Lambda_2 > \xi_2 > \Lambda_3 > \xi_3$. For q = 0.76 (where $e = 0.12 \in [e^*_2, e^*_3]$),
$\xi_1 > \Lambda_1 > \xi_2 > \Lambda_2 > \Lambda_3 > \xi_3$. For q = 0.6 (where $e = 0.2 > e^*_3$), $\xi_1 > \Lambda_1 > \xi_2 > \Lambda_2 > \xi_3 > \Lambda_3$.



consecutive critical values $e^*_j$, each two consecutive roots of $\Delta$ are separated by at least one $\Lambda_j$. When e
reaches each critical value $e^*_j$, the root $\xi_j$ crosses from one interval to another through the stage $\xi_j = \Lambda_j$.
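The orderings listed in the caption of Figure 5 can be reproduced numerically. The sketch assumes δ = (1, 2/3, 1/3), i.e., the values j/3 assigned in decreasing order, which yields the critical errors 0.091, 0.107 and 0.130 quoted in the caption; it computes the $e^*_j$ from (6) and, for each q, interleaves the eigenvalues $\xi_j$ of EC with the points $\Lambda_j$ from largest to smallest (labels `xi1`, `L1`, and so on are our own).

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


v, c = 1.0, -0.2
deltas = np.array([1.0, 2.0 / 3.0, 1.0 / 3.0])      # delta_1 > delta_2 > delta_3
n = len(deltas)
d = v + deltas - c
e_star = -c / d                                      # critical values (6)
print("critical errors:", np.round(e_star, 3))

C = np.diag(v + deltas) + c * (np.ones((n, n)) - np.eye(n))


def ordering(q):
    """Names of the eigenvalues xi_j of EC and points Lambda_j, sorted from largest to smallest."""
    e = (1.0 - q) / (n - 1)
    xi = np.sort(np.linalg.eigvals(error_matrix(n, e) @ C).real)[::-1]
    Lam = (q - e) * d
    labeled = [(x, f"xi{j + 1}") for j, x in enumerate(xi)] + \
              [(L, f"L{j + 1}") for j, L in enumerate(Lam)]
    return [name for _, name in sorted(labeled, reverse=True)]


for q in (0.98, 0.805, 0.76, 0.6):
    print(q, " > ".join(ordering(q)))
```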

Losing the bias. Suppose now that, for $j = 1, \ldots, n-1$, $\delta_j = \delta_{j+1} + \zeta_j$, and allow some of the $\zeta_j \to 0$; in the
limit, this results in a loss of bias in the covariance matrix C ($v + \delta_j = v + \delta_{j+1}$ for some index j). In
consequence, $\Lambda_j - \Lambda_{j+1} \to 0$. It follows that in the limit $\zeta_1 \to 0$ we have $\xi_1 = \Lambda_1 = \Lambda_2$, so that the maximal
eigenvalue of EC preserves its multiplicity 1. This situation changes if we introduce an order two bias
loss $\delta_1 = \delta_2 = \delta_3$ (i.e., if we make both $\zeta_1$ and $\zeta_2$ approach zero simultaneously). Then $\Lambda_1 - \Lambda_2 \to 0$ and
$\Lambda_2 - \Lambda_3 \to 0$, so that the two leading roots collide into a double root $\Lambda_3 = \xi_2 = \Lambda_2 = \xi_1 = \Lambda_1$. This justifies
the following proposition:

Proposition 3.2. Suppose $e < e^*_1$. A bias loss of the covariance matrix C of the type $\delta_1 = \cdots = \delta_k$ (i.e., of
order $k - 1$) results in a leading eigenvalue of multiplicity $k - 1$ for the modified covariance matrix EC.
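Proposition 3.2 can be probed numerically for n = 4 with a hypothetical bias loss δ1 = δ2 = δ3 = 1, δ4 = 0 (so k = 3), v = 1 and c = -0.2. Below $e^*_1$, the two largest eigenvalues of EC coincide, as expected for multiplicity k - 1 = 2; above $e^*_1$, a simple eigenvalue rises above them. Eigenvalues are computed from the symmetric matrix $C^{1/2} E C^{1/2}$, which is similar to EC and avoids spurious numerical splitting of the double eigenvalue.

```python
import numpy as np


def error_matrix(n, e):
    q = 1.0 - (n - 1) * e
    return (q - e) * np.eye(n) + e * np.ones((n, n))


def spectrum_EC(C, E):
    """Eigenvalues of EC (descending), computed from the similar symmetric matrix C^(1/2) E C^(1/2)."""
    w, V = np.linalg.eigh(C)
    root = V @ np.diag(np.sqrt(w)) @ V.T
    return np.linalg.eigvalsh(root @ E @ root)[::-1]


v, c, n = 1.0, -0.2, 4
deltas = np.array([1.0, 1.0, 1.0, 0.0])             # delta_1 = delta_2 = delta_3 (k = 3)
C = np.diag(v + deltas) + c * (np.ones((n, n)) - np.eye(n))
e_star_1 = -c / (v + deltas[0] - c)                 # critical value (6) for j = 1

for e in (0.5 * e_star_1, 1.5 * e_star_1):
    lam = spectrum_EC(C, error_matrix(n, e))
    side = "below" if e < e_star_1 else "above"
    print(f"e = {e:.3f} ({side} e*_1): eigenvalues {np.round(lam, 4)}")
```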

4 Discussion 

In this study, we considered a learning network based on the classical unsupervised learning model of
Oja, extended it to allow synaptic cross-talk (encoded either as inspecificity e, or as transmission quality
$q = 1 - (n-1)e$), and showed how different input patterns can exacerbate or, on the contrary, efface the
effects of this cross-talk on the asymptotic outcome of learning.

We made a few simplifying assumptions: we considered uniform magnitude of input cross-correlations
(i.e., uniform absolute value |c| of the off-diagonal elements of C), and uniform error (the Hebbian adjustment
of any weight was equally affected by error, and did not depend either on the strength of that weight
or on its identity). Such "isotropicity" seems like a reasonable basic assumption, and has been further
motivated and discussed in our previous work [13, 12]. Furthermore, it allowed us to identify other features of
the input distribution with crucial consequences for the learning dynamics and outcome: the cross-correlation
signs, and the input bias.
When observing the (qualitative and quantitative) effects that the presence of cross-talk can have on
the system's asymptotic behavior, we noted that these can vary substantially, depending on the input
second-order statistics. We found that, in specific highly unbiased cases, the cross-talk has no effect on
the presence and position of the asymptotic attractors (Figure 1e). In other cases, the depreciation of the
asymptotic outcome with error is so slow that small errors have virtually no effect on learning (Figures
1a,b; see also Figure 2 for a phase space illustration).

Other significant classes of inputs, however, exhibited a sudden change of the attracting direction from
an almost perfect estimator of the input principal component to a direction almost orthogonal to the original.
This occurred either in the form of an eigenvalue-swapping bifurcation in dynamics (producing an instantaneous
loss of learning accuracy at a critical error value; see Figure 1c, and Figure 3 for an illustration of the phase
space transitions), or in the milder form of an eigenvalue "avoided crossing" (inducing a smooth, yet very steep
depreciation of the learned direction at a specific error; see Figures 1d,g). As discussed in our previous
work, these two latter effects can be practically indistinguishable: learning works reasonably well for small
enough errors; for errors past the crash value, the outcome becomes irrelevant to the input statistics, and
the system is essentially encoding information on the cross-talk pattern itself.

Finally, we found that for highly unbiased inputs, learning may lead to an ambiguous outcome (a double leading eigenvalue), even in the absence of cross-talk (Figures 1f,h and 4). We had not encountered this in our previous, more restrictive versions of the model, since it requires inputs with concomitant negative cross-correlations and a loss of bias of order greater than 2. Our current analysis shows that the way in which cross-talk handles input ambiguity (i.e., non-isolated, neutrally attracting equilibria) depends quite significantly on the number and (in this case also) the geometry of the negative correlations within the input. Depending on these, we distinguished two cases: one in which even the smallest degree of cross-talk helps the system make an asymptotic selection of one particular direction in the eigenspace of the multiple eigenvalue, and one in which no small degree of inspecificity can perturb this "stable ambiguity". The critical level of cross-talk that finally destroys the curve of neutrally stable equilibria also pushes the system to learn an orthogonal direction, hence one irrelevant to the main features of the original input statistics.

While this extension still only considers a very simple model of learning, it helps us reiterate an important idea, which we have formulated before [13, 6, 8, 12]. A central problem for biological learning seems to be that the activity-dependent processes that lead to connection-strength adjustments cannot be completely synapse-specific [6, L]. This raises the possibility that sophisticated learning, such as presumably occurs in the neocortex, is enabled as much by special machinery for enhancing specificity as by special algorithms [ ]. It therefore seems possible that a key biological factor in learning problems lies not just in finding good architectures, techniques and estimation algorithms, but also in perfecting the relevant plasticity apparatus. We have suggested that learning plasticity errors are analogous to mutations, and that cortical circuitry might reduce such errors in the same manner as "proofreading" reduces DNA copying mistakes. This further suggests that problems of survival and reproduction are so diverse that no single algorithm can solve them all, so that no "universal" or "canonical" cortical circuit should be expected.
However, if every specialized algorithm relies on extraordinarily specific synaptic weight adjustment, then finding machinery that allows such specificity would indeed be equivalent to discovering new general neurobiological principles. We have speculated that an important part of such machinery, at least in the neocortex, might lie outside the synapse itself, in the form of complex circuitry performing a proofreading operation analogous to the one that ensures accuracy in polynucleotide copying [1, 3, 2]. Let us note that such machinery would be less necessary if update inaccuracy merely degraded learning rather than preventing it (a possibility our model does not exclude). Even so, when temporarily unfavorable input statistics lead to imperfect learning because of Hebbian inspecificity, the degraded weights might still be a useful starting point for better learning once the input statistics improve.
Appendix 1



The symmetric, positive definite matrix $C \in \mathcal{M}_n(\mathbb{R})$ defines a dot product on $\mathbb{R}^n$:

$$\langle v, w \rangle_C = v^T C w.$$

Although both $C$ and $E$ are symmetric, the product $EC$ is not symmetric in the Euclidean metric. However, $EC$ is symmetric (self-adjoint) with respect to the dot product $\langle \cdot, \cdot \rangle_C$. Indeed, for any pair of vectors $u, v \in \mathbb{R}^n$, we have

$$\langle ECu, v \rangle_C = (ECu)^T C v = u^T C^T E^T C v = u^T C E C v = \langle u, ECv \rangle_C.$$

In consequence, $EC$ has real eigenvalues and a basis of eigenvectors that is orthogonal with respect to the dot product $\langle \cdot, \cdot \rangle_C$.
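As an illustration (not part of the original argument), the self-adjointness of $EC$ with respect to $\langle \cdot, \cdot \rangle_C$ can be checked numerically. This sketch assumes the matrix forms of $E$ and $C$ given in Appendix 2 and illustrative parameter values of our own choosing ($n = 4$, $e = 0.05$, $c = -0.2$, $v = 1$, $\delta = (0.6, 0.4, 0.2, 0)$, which keep $C$ positive definite); `dot_C` is a hypothetical helper name.

```python
import numpy as np

# Illustrative parameters (our assumption, not values from the paper)
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e

# Error matrix E (q on the diagonal, e elsewhere) and covariance matrix C
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C


def dot_C(a, b):
    """C-weighted dot product <a, b>_C = a^T C b."""
    return a @ C @ b


rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)

# EC is not symmetric in the Euclidean metric ...
print(np.allclose(EC, EC.T))                           # False
# ... but it is self-adjoint with respect to <., .>_C
print(np.isclose(dot_C(EC @ x, y), dot_C(x, EC @ y)))  # True

# The eigenvectors of EC are mutually orthogonal in the C-metric
_, W = np.linalg.eig(EC)
W = np.real(W)
G = W.T @ C @ W  # Gram matrix of the eigenvectors in <., .>_C
print(np.allclose(G, np.diag(np.diag(G))))             # True
```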
The following theorem, describing the equilibria of the system (2), is immediate. 

Theorem 1. An equilibrium of the system is any vector $w = (w_1, \ldots, w_n)^T$ such that $ECw = (w^T C w)\,w$, i.e., an eigenvector of $EC$ (with corresponding eigenvalue $\lambda_w$), normalized with respect to the $C$-metric so that $\langle w, w \rangle_C = w^T C w = \lambda_w$:

$$ECw = \lambda_w w, \qquad \langle w, w \rangle_C = \lambda_w.$$
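A minimal sketch of Theorem 1, assuming (consistently with the equilibrium condition and with the Jacobian given in this appendix) that system (2) reads $\dot w = \gamma[ECw - (w^T C w)w]$. With illustrative parameters of our own choosing ($n = 4$, $e = 0.05$, $c = -0.2$, $v = 1$, $\delta = (0.6, 0.4, 0.2, 0)$), each eigenvector of $EC$ is rescaled so that $w^T C w = \lambda_w$, and the right-hand side is checked to vanish there. The helper `rhs` is hypothetical.

```python
import numpy as np

# Illustrative parameters (assumed for this sketch)
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C


def rhs(w, gamma=1.0):
    """Assumed form of system (2): gamma * [EC w - (w^T C w) w]."""
    return gamma * (EC @ w - (w @ C @ w) * w)


lam, W = np.linalg.eig(EC)
lam, W = np.real(lam), np.real(W)

equilibria = []
for k in range(n):
    u = W[:, k]
    # Rescale the eigenvector so that <w, w>_C = w^T C w equals its eigenvalue
    w = u * np.sqrt(lam[k] / (u @ C @ u))
    equilibria.append(w)
    print(f"lambda = {lam[k]:.4f}   |f(w)| = {np.linalg.norm(rhs(w)):.1e}")
```

Since all eigenvalues of $EC$ are positive here, the square root is well defined; the opposite vectors $-w$ are equilibria as well.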

If we additionally assume (generically) that $EC$ has a strictly positive maximal eigenvalue of multiplicity one, then the corresponding eigendirection is orthogonal, with respect to $\langle \cdot, \cdot \rangle_C$, to all other eigenvectors of $EC$.

Take then $w$ to be an equilibrium of the system (2), i.e., an eigenvector of $EC$ with eigenvalue $\lambda_w = w^T C w > 0$. To establish stability, we calculate the Jacobian matrix at $w$:

$$Df_w = \gamma\left[EC - 2w(Cw)^T - (w^T C w) I\right].$$
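To double-check the Jacobian, one can compare the closed form with central finite differences of the right-hand side. This is a sketch under the assumed form $\dot w = \gamma[ECw - (w^T C w)w]$ of system (2), with illustrative parameters of our own choosing (listed in the code, including $\gamma = 0.5$); `jacobian_formula` and `jacobian_fd` are hypothetical helper names.

```python
import numpy as np

# Illustrative parameters (assumed); gamma is the learning rate of system (2)
n, e, c, v, gamma = 4, 0.05, -0.2, 1.0, 0.5
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C


def rhs(w):
    """Assumed form of system (2): gamma * [EC w - (w^T C w) w]."""
    return gamma * (EC @ w - (w @ C @ w) * w)


def jacobian_formula(w):
    """Closed form: gamma * [EC - 2 w (Cw)^T - (w^T C w) I]."""
    return gamma * (EC - 2 * np.outer(w, C @ w) - (w @ C @ w) * np.eye(n))


def jacobian_fd(w, h=1e-6):
    """Central finite differences, one column per coordinate direction."""
    J = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        J[:, k] = (rhs(w + step) - rhs(w - step)) / (2 * h)
    return J


w0 = np.random.default_rng(1).standard_normal(n)  # arbitrary test point
err = np.max(np.abs(jacobian_formula(w0) - jacobian_fd(w0)))
print(f"max |closed form - finite difference| = {err:.1e}")
```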

Then we get the following:

Theorem 2. Suppose $EC$ has a largest eigenvalue of multiplicity one. An equilibrium $w$ (i.e., by Theorem 1, an eigenvector of $EC$ with eigenvalue $\lambda_w$, normalized so that $\langle w, w \rangle_C = \lambda_w$) is a local hyperbolic attractor for (2) iff it is an eigenvector corresponding to the maximal eigenvalue of $EC$.

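Proof. Fix an eigenvector $w$ of $EC$, with $ECw = \lambda_w w$. Then, since $w^T C w = \lambda_w$:

$$Df_w w = \gamma\left[\lambda_w w - 2w(w^T C w) - \lambda_w w\right] = -2\gamma\lambda_w w.$$

Recall that the vector $w$ can be completed to a basis $\mathcal{B}$ of eigenvectors, orthogonal with respect to the dot product $\langle \cdot, \cdot \rangle_C$. Let $v \in \mathcal{B}$, $v \neq w$, be any other vector in this basis, so that $ECv = \lambda_v v$ and $\langle w, v \rangle_C = w^T C v = 0$. We calculate:

$$Df_w v = \gamma\left[\lambda_v v - 2w(w^T C v) - \lambda_w v\right] = -\gamma\left[\lambda_w - \lambda_v\right] v.$$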

So $\mathcal{B}$ is also a basis of eigenvectors for $Df_w$. The corresponding eigenvalues are $-2\gamma\lambda_w$ (for the eigenvector $w$) and $-\gamma[\lambda_w - \lambda_v]$ (for any other eigenvector $v \in \mathcal{B}$, $v \neq w$). An equivalent condition for $w$ to be a hyperbolic attractor for the system (2) is that all the eigenvalues of $Df_w$ are negative. Since $\gamma, \lambda_w > 0$, this condition is further equivalent to having $-\gamma(\lambda_w - \lambda_v) < 0$ for all $v \in \mathcal{B}$, $v \neq w$. In conclusion, an equilibrium $w$ is a hyperbolic attractor if and only if $\lambda_w > \lambda_v$ for all $v \neq w$ (i.e., $\lambda_w$ is the maximal eigenvalue or, in other words, $w$ is in the direction of the principal eigenvector of $EC$).
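The spectrum computed in this proof can be verified numerically at every equilibrium. Under illustrative parameters of our own choosing (listed in the code) and the assumed form $\dot w = \gamma[ECw - (w^T C w)w]$ of system (2), the sketch below compares the eigenvalues of $Df_w$ with $-2\gamma\lambda_w$ and $-\gamma(\lambda_w - \lambda_v)$, and records which equilibria are hyperbolic attractors.

```python
import numpy as np

# Illustrative parameters (assumed)
n, e, c, v, gamma = 4, 0.05, -0.2, 1.0, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C

lam, W = np.linalg.eig(EC)
lam, W = np.real(lam), np.real(W)
order = np.argsort(lam)[::-1]  # decreasing eigenvalues
lam, W = lam[order], W[:, order]

stable = []
for k in range(n):
    w = W[:, k] * np.sqrt(lam[k] / (W[:, k] @ C @ W[:, k]))  # <w, w>_C = lam_k
    J = gamma * (EC - 2 * np.outer(w, C @ w) - (w @ C @ w) * np.eye(n))
    mu = np.sort(np.real(np.linalg.eigvals(J)))
    # Spectrum predicted in the proof: -2 gamma lam_k and -gamma (lam_k - lam_j)
    predicted = np.sort([-2 * gamma * lam[k]]
                        + [-gamma * (lam[k] - lam[j]) for j in range(n) if j != k])
    assert np.allclose(mu, predicted)
    stable.append(bool(np.all(mu < 0)))

print(stable)  # [True, False, False, False]
```

Only the equilibrium along the principal eigenvector (index 0 after sorting) is an attractor; the others are saddles.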
□

Such attractors always exist provided that the condition of Theorem 2 is met (i.e., $EC$ has a maximal eigenvalue of multiplicity one). The network then learns, depending on its initial state, one of the two stable equilibria, which are the two (opposite) maximal eigenvectors of the modified input distribution, normalized so that $\langle w, w \rangle_C = \lambda_w$. Next, we show that these two attractors are the system's only hyperbolic attractors.

Theorem 3. Suppose the modified covariance matrix $EC$ has a unique maximal eigenvalue $\lambda_1$. Then the two eigenvectors $\pm w_{EC}$ corresponding to $\lambda_1$, normalized such that $\langle w, w \rangle_C = \lambda_1$, are the only two attractors of the system. More precisely, the phase space is divided into two basins of attraction, of $w_{EC}$ and $-w_{EC}$ respectively, separated by the hyperplane $\langle w, w_{EC} \rangle_C = 0$.



Proof. We make the change of variable $u = \sqrt{C}\,w$. The system then becomes (after rescaling time by $\gamma$):

$$\dot{u} = Au - (u^T u)\,u \qquad (7)$$

where $A = \sqrt{C}\,E\,\sqrt{C}$ is a symmetric matrix having the same eigenvalues as $EC$. More precisely, $w$ is an eigenvector of $EC$ with eigenvalue $\mu$ iff $\sqrt{C}\,w$ is an eigenvector of $A$ with eigenvalue $\mu$; moreover, since $A$ is symmetric, any two eigenvectors of $A$ corresponding to distinct eigenvalues are orthogonal in the regular Euclidean dot product.
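A short numerical check of the change of variable, under illustrative parameters of our own choosing (listed in the code). The symmetric square root $\sqrt{C}$ is formed from the eigendecomposition of $C$, which is one possible construction assumed here, and $A = \sqrt{C}\,E\,\sqrt{C}$ is compared with $EC$.

```python
import numpy as np

# Illustrative parameters (assumed)
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C

# Symmetric square root of the positive definite matrix C
s, U = np.linalg.eigh(C)
sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
A = sqrtC @ E @ sqrtC

print(np.allclose(A, A.T))                                   # True
eig_EC = np.sort(np.real(np.linalg.eigvals(EC)))
eig_A = np.linalg.eigvalsh(A)                                # ascending order
print(np.allclose(eig_EC, eig_A))                            # True

# An eigenvector w of EC is mapped to the eigenvector sqrt(C) w of A
lam, W = np.linalg.eig(EC)
mu, w = np.real(lam[0]), np.real(W[:, 0])
print(np.allclose(A @ (sqrtC @ w), mu * (sqrtC @ w)))        # True
```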

Consider then $v$ to be the leading eigenvector of $A$, and let $u = u(t)$ be a trajectory of the system (7). We want to observe the evolution in time of the angle $\theta$ between the variable vector $u$ and the fixed vector $v$, measured as:

$$\cos\theta = \frac{v^T u}{\|v\|\,\|u\|}.$$

We differentiate and obtain:

$$-\|v\|\sin\theta\,\dot{\theta} = \frac{(v^T\dot{u})(u^T u) - (v^T u)(u^T \dot{u})}{\|u\|^3}.$$

Substituting (7), the numerator of this expression is

$$h(u) = (u^T u)(v^T A u) - (v^T u)(u^T A u). \qquad (8)$$

We are interested in the sign of $h(u)$; to make our computations simpler, we can diagonalize $A$ in a basis of orthogonal eigenvectors, $A = PDP^T$, where $D$ is the diagonal matrix of eigenvalues (in decreasing order) and $P$ is an orthogonal matrix whose columns are the corresponding eigenvectors. Then:

$$h(u) = (z^T z)(y^T D z) - (y^T z)(z^T D z),$$

where $y = P^T v$ and $z = P^T u$, so that $Dy = DP^T v = \lambda_1 y$ (where $\lambda_1$ is the largest eigenvalue of $EC$, assumed to have multiplicity one). Then:

$$h(u) = (y^T z)\sum_j (\lambda_1 - \lambda_j)\,z_j^2.$$
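The diagonalized form of $h(u)$ can be compared with (8) at an arbitrary state. In this sketch (illustrative parameters of our own choosing), `v_lead` stands for the leading eigenvector $v$ of $A$, renamed to avoid a clash with the scalar variance $v$, and the columns of `P` are the eigenvectors of $A$ in decreasing order of eigenvalue.

```python
import numpy as np

# Illustrative parameters (assumed); A = sqrt(C) E sqrt(C) as in the change of variable
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
s, U = np.linalg.eigh(C)
sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
A = sqrtC @ E @ sqrtC

# A = P D P^T with eigenvalues in decreasing order; columns of P are eigenvectors
lam, P = np.linalg.eigh(A)
lam, P = lam[::-1], P[:, ::-1]
v_lead = P[:, 0]  # leading eigenvector of A

u = np.random.default_rng(2).standard_normal(n)  # arbitrary state
h_direct = (u @ u) * (v_lead @ A @ u) - (v_lead @ u) * (u @ A @ u)  # formula (8)
z = P.T @ u
h_diag = (v_lead @ u) * np.sum((lam[0] - lam) * z**2)              # diagonalized form
print(np.isclose(h_direct, h_diag))  # True
```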



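Hence, if $y^T z > 0$, then $h(u) > 0$ unless $u$ is parallel to $v$ (note that $y^T z = v^T u$, and that $\lambda_1 > \lambda_j$ for all $j \geq 2$). In other words, if $v^T u > 0$ (and $u$ is not already parallel to $v$), then $-\|v\|\sin\theta\,\dot{\theta} > 0$, hence $\dot{\theta} < 0$: the angle between $u$ and $v$ decreases. For our original system, since $v^T u$ is a positive multiple of $\langle w, w_{EC} \rangle_C$ when $v$ is taken along $\sqrt{C}\,w_{EC}$, this means that any trajectory starting at a $w$ with $\langle w, w_{EC} \rangle_C > 0$ converges in time towards the principal eigenvector $w_{EC}$ of the matrix $EC$. □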
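Theorem 3 can also be illustrated by integrating the flow. The sketch below assumes system (2) has the form $\dot w = \gamma[ECw - (w^T C w)w]$, uses forward Euler (our choice of integrator, with step size and horizon picked for these illustrative parameters), and starts from two opposite initial conditions; each run should end at $+w_{EC}$ or $-w_{EC}$ according to the sign of $\langle w(0), w_{EC} \rangle_C$. The helper `simulate` is hypothetical.

```python
import numpy as np

# Illustrative parameters (assumed)
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C

lam, W = np.linalg.eig(EC)
lam, W = np.real(lam), np.real(W)
k = int(np.argmax(lam))
w_ec = W[:, k] * np.sqrt(lam[k] / (W[:, k] @ C @ W[:, k]))  # <w_ec, w_ec>_C = lambda_1


def simulate(w, gamma=1.0, dt=0.05, steps=4000):
    """Forward-Euler integration of dw/dt = gamma [EC w - (w^T C w) w]."""
    for _ in range(steps):
        w = w + dt * gamma * (EC @ w - (w @ C @ w) * w)
    return w


w0 = 0.1 * np.random.default_rng(3).standard_normal(n)
results = []
for start in (w0, -w0):
    side = np.sign(start @ C @ w_ec)  # sign of <w(0), w_EC>_C
    w_end = simulate(start)
    results.append(np.allclose(w_end, side * w_ec, atol=1e-6))
    print(f"sign of <w(0), w_EC>_C: {side:+.0f}, converged: {results[-1]}")
```

With these assumed parameters the gap between the two leading eigenvalues is roughly 0.17, so a horizon of 200 time units is ample; closer to a bifurcation, where the gap shrinks, longer runs would be needed.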
Appendix 2



We want a concise description of the modified input matrix $EC$. To begin with, we can express the matrices $E$ and $C$ individually as $E = eM + (q - e)I$ and $C = cM + (v - c)I + \sum_j \delta_j A_j$, where $I$ is the $n \times n$ identity matrix, $M$ is the $n \times n$ matrix with all entries equal to one, and, for any $j = 1, \ldots, n$, $A_j$ is the matrix with zero entries except $A_j(j,j) = 1$. Note, for future computations, that $M^2 = nM$ and that $MA_j$ is the matrix whose only nonzero entries are ones along the $j$-th column. Unless otherwise specified, the summations are for $j = 1, \ldots, n$. The product $EC$ will then be

$$EC = \left[e(v - c) + c(q - e) + ecn\right]M + (q - e)(v - c)I + e\sum_j \delta_j MA_j + (q - e)\sum_j \delta_j A_j.$$
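In matrix form, this translates as

$$EC - \lambda I = \begin{pmatrix} X_1(\lambda) & f_2 & \cdots & f_n \\ f_1 & X_2(\lambda) & \cdots & f_n \\ \vdots & \vdots & \ddots & \vdots \\ f_1 & f_2 & \cdots & X_n(\lambda) \end{pmatrix},$$

where, for all $j = 1, \ldots, n$, we set $f_j = e(v + \delta_j - c) + c$ and $X_j(\lambda) = q(v + \delta_j - c) + c - \lambda$ (using $q = 1 - (n-1)e$).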
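The expansion of $EC$ and its entry-wise form can be checked numerically. The sketch below (illustrative parameters of our own choosing) builds $M$, $I$ and the $A_j$ explicitly. Note that the entry-wise identities, $f_j$ off the diagonal of column $j$ and $X_j(0) = q(v + \delta_j - c) + c$ on the diagonal, rely on the relation $q = 1 - (n-1)e$.

```python
import numpy as np

# Illustrative parameters (assumed); q = 1 - (n - 1) e as in the model
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
I, M = np.eye(n), np.ones((n, n))
A_list = [np.diag(I[j]) for j in range(n)]  # A_j: single 1 at position (j, j)

E = e * M + (q - e) * I
C = c * M + (v - c) * I + sum(d * Aj for d, Aj in zip(delta, A_list))
EC = E @ C

# Expansion of EC in terms of M, I, M A_j and A_j
expansion = ((e * (v - c) + c * (q - e) + e * c * n) * M
             + (q - e) * (v - c) * I
             + e * sum(d * (M @ Aj) for d, Aj in zip(delta, A_list))
             + (q - e) * sum(d * Aj for d, Aj in zip(delta, A_list)))
print(np.allclose(EC, expansion))                               # True

# Entry-wise form: column j carries f_j off the diagonal; the diagonal is X_j(0)
f = e * (v + delta - c) + c
X0 = q * (v + delta - c) + c
off_diag = EC - np.diag(np.diag(EC))
print(np.allclose(off_diag, np.tile(f, (n, 1)) - np.diag(f)))  # True
print(np.allclose(np.diag(EC), X0))                             # True
```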
Fully biased case 

We first consider the covariance biases $\delta_j$ to be distinct: $\delta_1 > \delta_2 > \cdots > \delta_{n-1} > \delta_n = 0$. We will prove that the characteristic polynomial $\Delta(\lambda) = \det(EC - \lambda I)$ has $n$ real roots $\xi_1 > \xi_2 > \cdots > \xi_n$, and we will find approximating bounds for their positions on the real line.

Remark first that the end behavior of $\Delta(\lambda)$ is given by $\lim_{\lambda \to -\infty} \Delta(\lambda) = +\infty$ and $\lim_{\lambda \to +\infty} \Delta(\lambda) = (-1)^n \infty$.

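Consider $\lambda_j = (q - e)(v + \delta_j - c)$; clearly, $\lambda_1 > \lambda_2 > \cdots > \lambda_n$. We will use these values to partition the real line and separate the roots of $\Delta$. To begin, we calculate, for all $i, j = 1, \ldots, n$:

$$X_i(\lambda_j) = f_i + (q - e)(\delta_i - \delta_j). \qquad (9)$$

In particular, $X_j(\lambda_j) = f_j$ for all $j = 1, \ldots, n$. By row and column manipulations, it can be shown that, for all $j = 1, \ldots, n$,

$$\Delta(\lambda_j) = f_j\,(q - e)^{n-1}\prod_{i \neq j}(\delta_i - \delta_j). \qquad (10)$$

In consequence, since $q - e = 1 - ne > 0$ for $e < 1/n$, and exactly $n - j$ of the factors $\delta_i - \delta_j$ are negative, $\operatorname{sign}(\Delta(\lambda_j)) = \operatorname{sign}(f_j)(-1)^{n-j}$.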
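Formula (10) and the resulting sign rule can be compared against a direct determinant evaluation. This is a sketch with illustrative, fully biased parameters of our own choosing; the code uses 0-based indices, so the exponent $n - j$ of the text becomes `n - 1 - j`.

```python
import numpy as np

# Illustrative parameters (assumed), fully biased case delta_1 > ... > delta_n = 0
n, e, c, v = 4, 0.05, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
EC = E @ C

lam_part = (q - e) * (v + delta - c)  # partition points lambda_j
f = e * (v + delta - c) + c

checks = []
for j in range(n):  # 0-based j corresponds to index j + 1 in the text
    direct = np.linalg.det(EC - lam_part[j] * np.eye(n))
    others = [i for i in range(n) if i != j]
    formula = f[j] * (q - e) ** (n - 1) * np.prod(delta[others] - delta[j])  # (10)
    sign_rule = np.sign(f[j]) * (-1) ** (n - 1 - j)  # sign(f_j) (-1)^(n - j)
    checks.append(bool(np.isclose(direct, formula) and np.sign(direct) == sign_rule))

print(checks)  # [True, True, True, True]
```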

Recall that $f_j = e(v + \delta_j - c) + c$, hence $f_1 > f_2 > \cdots > f_n$ (for $e > 0$). To continue our discussion and establish the signs of $\Delta$ at all partition points $\lambda_j$, we need to find the index $j$ at which the values $f_j$ switch sign. For each $j = 1, \ldots, n$, consider the "critical" error value $e^*_j$, for which $f_j(e^*_j) = 0$:

$$e^*_j = \frac{-c}{v + \delta_j - c}, \qquad (11)$$

so that $0 < e^*_1 < e^*_2 < \cdots < e^*_n$ (since $\delta_1 > \delta_2 > \cdots > \delta_n$; these critical values are positive only when $c < 0$). Clearly, for all $j = 1, \ldots, n$, we have $f_j > 0$ iff $e > e^*_j$.
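The critical error values (11) are straightforward to compute. The sketch below uses illustrative parameters of our own choosing with $c < 0$ and $v > (n-1)|c|$, so that, as discussed in the Remark that follows, all $e^*_j$ lie below $1/n$. The hypothetical helper `count_positive_f` returns the number $p$ of positive $f_j$ at a given error level, which tells which of the cases discussed next applies.

```python
import numpy as np

# Illustrative parameters (assumed): negative correlation c < 0 and v > (n - 1)|c|
n, c, v = 4, -0.2, 1.0
delta = np.array([0.6, 0.4, 0.2, 0.0])

e_star = -c / (v + delta - c)              # critical error values (11)
f_at_crit = e_star * (v + delta - c) + c   # f_j evaluated at e = e*_j
print(np.round(e_star, 4))
print(np.allclose(f_at_crit, 0))           # True: f_j vanishes at e*_j
print(bool(np.all(np.diff(e_star) > 0)))   # True: 0 < e*_1 < ... < e*_n
print(bool(np.all(e_star < 1 / n)))        # True: all critical values lie below 1/n


def count_positive_f(e):
    """Number p of indices with f_j > 0 at error level e (hypothetical helper)."""
    return int(np.sum(e * (v + delta - c) + c > 0))


print([count_positive_f(e) for e in (0.05, 0.13, 0.2)])  # [0, 2, 4]
```

For these parameters, $e = 0.05$, $0.13$ and $0.2$ fall respectively below $e^*_1$, between $e^*_2$ and $e^*_3$, and above $e^*_n$.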
Remark. A safe assumption that allows us to study all the cases that may appear is $v > (n - 1)|c|$, which guarantees $e^*_j < 1/n$ for all $j = 1, \ldots, n$. This ensures a complete discussion, since then $e \in [0, 1/n]$ is allowed to reach and cross all the critical values $e^*_j$, creating a possible swap in the order of the eigenvalues of $EC$, as we will show later. The proof for the other cases will be omitted, since it is just a simplification of the present argument. In fact, the only crossover of true interest to us is $e = e^*_1$, where the eigenvalue swap involves the two largest eigenvalues and thus affects the position of the system's attracting equilibria, corresponding to the normalized eigenvectors of the maximal eigenvalue; the other critical values $e = e^*_j$, for $j \geq 2$, only affect the stable/unstable spaces of the saddle equilibria. In this light, the condition on the entries of the covariance matrix can be loosened to $v > (n - 1)|c| - \delta_1$.

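We distinguish the following cases.

(I.) For $0 < e < e^*_1$. This implies $f_j < 0$ for all $j = 1, \ldots, n$. Then

$$\operatorname{sign}(\Delta(\lambda_j)) = \operatorname{sign}(f_j)(-1)^{n-j} = (-1)(-1)^{n-j} = (-1)^{n-j+1}. \qquad (12)$$

From the end behavior of $\Delta$ and (12), we obtain the following sign table:

$$\begin{array}{c|ccccccc}
\lambda & -\infty & \lambda_n & \lambda_{n-1} & \cdots & \lambda_2 & \lambda_1 & +\infty \\ \hline
\operatorname{sign}(\Delta(\lambda)) & + & - & + & \cdots & (-1)^{n-1} & (-1)^n & (-1)^n
\end{array}$$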



From the Intermediate Value Theorem and the Fundamental Theorem of Algebra, it follows that the polynomial $\Delta(\lambda)$ has $n$ real roots $\xi_1 > \xi_2 > \cdots > \xi_n$, such that:

$$-\infty < \xi_n < \lambda_n < \xi_{n-1} < \lambda_{n-1} < \cdots < \lambda_2 < \xi_1 < \lambda_1 < \infty. \qquad (13)$$
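The interlacing (13) can be observed numerically. The sketch below, under illustrative parameters of our own choosing, computes the spectrum of $EC$ through the similar symmetric matrix $\sqrt{C}\,E\,\sqrt{C}$, which yields real eigenvalues in a numerically stable way; `spectra` is a hypothetical helper name. For these parameters $e^*_1 = 0.2/1.8 \approx 0.111$, so $e = 0.05$ falls in case (I.).

```python
import numpy as np


def spectra(e, n=4, c=-0.2, v=1.0, delta=(0.6, 0.4, 0.2, 0.0)):
    """Eigenvalues xi_1 > ... > xi_n of EC and partition points lambda_j (hypothetical helper)."""
    delta = np.asarray(delta)
    q = 1 - (n - 1) * e
    E = e * np.ones((n, n)) + (q - e) * np.eye(n)
    C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
    s, U = np.linalg.eigh(C)
    sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
    xi = np.linalg.eigvalsh(sqrtC @ E @ sqrtC)[::-1]  # same spectrum as EC, decreasing
    lam = (q - e) * (v + delta - c)                   # decreasing partition points
    return xi, lam


# Case (I.): e = 0.05 < e*_1 ~ 0.111 for these assumed parameters
xi, lam = spectra(e=0.05)
# (13): xi_n < lam_n < xi_{n-1} < ... < lam_2 < xi_1 < lam_1
case_I = bool(np.all(xi < lam) and np.all(xi[:-1] > lam[1:]))
print(xi.round(4), lam.round(4))
print(case_I)  # True
```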
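(II.) For $e^*_p < e < e^*_{p+1}$, with $1 \leq p \leq n - 1$. Then $f_1, \ldots, f_p > 0$ and $f_{p+1}, \ldots, f_n < 0$. Similarly as in (I.), we have:

$$\begin{array}{c|ccccccccc}
\lambda & -\infty & \lambda_n & \lambda_{n-1} & \cdots & \lambda_{p+1} & \lambda_p & \cdots & \lambda_1 & +\infty \\ \hline
\operatorname{sign}(\Delta(\lambda)) & + & - & + & \cdots & (-1)^{n-p} & (-1)^{n-p} & \cdots & (-1)^{n-1} & (-1)^n
\end{array}$$

hence the polynomial $\Delta(\lambda)$ has roots $\xi_1 > \xi_2 > \cdots > \xi_n$ such that:

$$-\infty < \xi_n < \lambda_n < \xi_{n-1} < \lambda_{n-1} < \cdots < \xi_{p+1} < \lambda_{p+1} < \lambda_p < \xi_p < \cdots < \lambda_1 < \xi_1 < \infty. \qquad (14)$$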
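A similar check for case (II.), again a sketch with illustrative parameters of our own choosing: here $e^*_2 = 0.125 < e = 0.13 < e^*_3 \approx 0.143$, so exactly $p = 2$ of the $f_j$ are positive. The hypothetical helper `spectra` computes the eigenvalues of $EC$ through the similar symmetric matrix $\sqrt{C}\,E\,\sqrt{C}$.

```python
import numpy as np


def spectra(e, n=4, c=-0.2, v=1.0, delta=(0.6, 0.4, 0.2, 0.0)):
    """Eigenvalues xi_1 > ... > xi_n of EC and partition points lambda_j (hypothetical helper)."""
    delta = np.asarray(delta)
    q = 1 - (n - 1) * e
    E = e * np.ones((n, n)) + (q - e) * np.eye(n)
    C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
    s, U = np.linalg.eigh(C)
    sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
    xi = np.linalg.eigvalsh(sqrtC @ E @ sqrtC)[::-1]  # decreasing
    lam = (q - e) * (v + delta - c)                   # decreasing
    return xi, lam


xi, lam = spectra(e=0.13)
p = 2  # number of positive f_j for these assumed parameters (0-based slices below)
# (14), lower part: xi_n < lam_n < xi_{n-1} < ... < xi_{p+1} < lam_{p+1}
lower = np.all(xi[p:] < lam[p:]) and np.all(xi[p:-1] > lam[p + 1:])
# (14), upper part: lam_p < xi_p < lam_{p-1} < ... < lam_1 < xi_1
upper = np.all(xi[:p] > lam[:p]) and np.all(xi[1:p] < lam[:p - 1])
case_II = bool(lower and upper)
print(case_II)  # True
```

Note that no eigenvalue falls between $\lambda_{p+1}$ and $\lambda_p$, which is where the leading eigenvalue has "jumped over" $\lambda_1$.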
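(III.) For $e^*_n < e < 1/n$. Then $f_1, \ldots, f_n > 0$ and we have

$$\begin{array}{c|ccccccc}
\lambda & -\infty & \lambda_n & \lambda_{n-1} & \cdots & \lambda_2 & \lambda_1 & +\infty \\ \hline
\operatorname{sign}(\Delta(\lambda)) & + & + & - & \cdots & (-1)^n & (-1)^{n-1} & (-1)^n
\end{array}$$

and the polynomial $\Delta(\lambda)$ has roots $\xi_1 > \xi_2 > \cdots > \xi_n$ such that:

$$-\infty < \lambda_n < \xi_n < \lambda_{n-1} < \xi_{n-1} < \cdots < \lambda_1 < \xi_1 < \infty. \qquad (15)$$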
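Case (III.) can be checked the same way. In this sketch (illustrative parameters of our own choosing), $e^*_n = 0.2/1.2 \approx 0.167 < e = 0.2 < 1/n = 0.25$, so all $f_j$ are positive; `spectra` is again a hypothetical helper.

```python
import numpy as np


def spectra(e, n=4, c=-0.2, v=1.0, delta=(0.6, 0.4, 0.2, 0.0)):
    """Eigenvalues xi_1 > ... > xi_n of EC and partition points lambda_j (hypothetical helper)."""
    delta = np.asarray(delta)
    q = 1 - (n - 1) * e
    E = e * np.ones((n, n)) + (q - e) * np.eye(n)
    C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
    s, U = np.linalg.eigh(C)
    sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
    xi = np.linalg.eigvalsh(sqrtC @ E @ sqrtC)[::-1]  # decreasing
    lam = (q - e) * (v + delta - c)                   # decreasing
    return xi, lam


xi, lam = spectra(e=0.2)
# (15): lam_n < xi_n < lam_{n-1} < xi_{n-1} < ... < lam_1 < xi_1
case_III = bool(np.all(xi > lam) and np.all(xi[1:] < lam[:-1]))
print(case_III)  # True
```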
In particular, we have proved the following proposition, stated in the main text:

Proposition 3.1. In the biased case $\delta_1 > \delta_2 > \cdots > \delta_n$, the matrix $EC$ has $n$ real distinct eigenvalues $\xi_1 > \xi_2 > \cdots > \xi_n$.
Losing the bias

Suppose now that, for $j = 1, \ldots, n - 1$, $\delta_j = \delta_{j+1} + \zeta_j$, and allow some of the $\zeta_j \to 0$; in the limit, this results in a loss of bias in the covariance matrix $C$ ($v + \delta_j = v + \delta_{j+1}$ for some index $j$). In consequence, $\lambda_j - \lambda_{j+1} \to 0$.

Let us study the changes of the maximal root $\xi_1$ as $\zeta_1 \to 0$ (i.e., as we eliminate the bias between the two most correlated components of the matrix $C$). Suppose $e \in [0, e^*_1)$. This calculation can be extended similarly to the other intervals for $e$; however, we only discuss here the case $e \in [0, e^*_1)$, since it is the only one that relates directly to the position and multiplicity of the leading root of $\Delta$. It also agrees with our goal of studying the behavior of the system for small enough transmission errors. According to (13), we have

$$-\infty < \xi_n < \lambda_n < \xi_{n-1} < \lambda_{n-1} < \cdots < \lambda_2 < \xi_1 < \lambda_1 < \infty.$$

Since $\lambda_1 \to \lambda_2$, it follows that in the limit $\zeta_1 = 0$ we have $\xi_1 = \lambda_1 = \lambda_2$, while $\xi_2$ remains below $\lambda_2$, so that the maximal eigenvalue of $EC$ preserves its multiplicity of one. This situation changes if we introduce a bias loss among the three leading components, $\delta_1 = \delta_2 = \delta_3$ (i.e., if we make both $\zeta_1$ and $\zeta_2$ approach zero simultaneously). Then $\lambda_1 - \lambda_2 \to 0$ and $\lambda_2 - \lambda_3 \to 0$, so that the two leading roots collide into a double root $\lambda_3 = \xi_2 = \lambda_2 = \xi_1 = \lambda_1$. This justifies the following proposition:
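The effect of bias loss on the leading eigenvalue can be illustrated numerically. In this sketch (illustrative parameters of our own choosing, with $e = 0.05$, which lies below $e^*_1$ in all three cases), `leading_eigs` is a hypothetical helper returning $\xi_1$, $\lambda_1$ and the numerical multiplicity of the leading eigenvalue; deciding multiplicity with an absolute tolerance is a numerical assumption.

```python
import numpy as np


def leading_eigs(delta, e=0.05, n=4, c=-0.2, v=1.0, tol=1e-8):
    """Leading eigenvalue of EC, lambda_1, and multiplicity of the leading eigenvalue."""
    delta = np.asarray(delta)
    q = 1 - (n - 1) * e
    E = e * np.ones((n, n)) + (q - e) * np.eye(n)
    C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)
    s, U = np.linalg.eigh(C)
    sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
    xi = np.linalg.eigvalsh(sqrtC @ E @ sqrtC)[::-1]  # spectrum of EC, decreasing
    lam1 = (q - e) * (v + delta[0] - c)
    mult = int(np.sum(np.abs(xi - xi[0]) < tol))
    return xi[0], lam1, mult


cases = {
    "fully biased": (0.6, 0.4, 0.2, 0.0),
    "delta_1 = delta_2": (0.4, 0.4, 0.2, 0.0),
    "delta_1 = delta_2 = delta_3": (0.4, 0.4, 0.4, 0.0),
}
out = {}
for name, d in cases.items():
    top, lam1, mult = leading_eigs(d)
    out[name] = (round(float(top), 4), round(float(lam1), 4), mult)
    print(f"{name:28s} xi_1 = {top:.4f}  lambda_1 = {lam1:.4f}  multiplicity = {mult}")
```

In the fully biased case $\xi_1$ lies strictly below $\lambda_1$, as in (13); with $\delta_1 = \delta_2$ the leading eigenvalue equals $\lambda_1 = \lambda_2$ and is still simple; with $\delta_1 = \delta_2 = \delta_3$ it becomes a double eigenvalue.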

Proposition 3.2. Suppose $e < e^*_1$. An order-$k$ bias loss of the covariance matrix $C$, of the type $\delta_1 = \cdots = \delta_k$, results in a leading eigenvalue of multiplicity $k - 1$ for the modified covariance matrix $EC$.

This proposition can be generalized to encompass bias loss anywhere in the inputs, and any interval 
for the error e. Below, we give a more general statement, which follows by repeating the argument for the 
case we already analyzed, but could also be proved more directly. 

Theorem. Suppose that the matrix $C$ is allowed to exhibit bias loss in all possible ways, so that it can be written in block form as (5), where there exist $k_1, k_2, \ldots, k_N \in \{1, \ldots, n\}$ with $\sum_{j=1}^{N} k_j = n$, and such that

$$\begin{aligned}
\delta_1 = \cdots = \delta_{k_1} &= \nu_1, \\
\delta_{k_1+1} = \cdots = \delta_{k_1+k_2} &= \nu_2, \\
&\;\;\vdots \\
\delta_{n-k_N+1} = \cdots = \delta_n &= \nu_N,
\end{aligned}$$

with $\nu_1 > \nu_2 > \cdots > \nu_N$. Then the characteristic polynomial $\Delta$ of $EC$ has only real roots. More precisely, the eigenvalues of $EC$ are $(q - e)(v + \nu_j - c)$, with multiplicity $k_j - 1$, for all $j = 1, \ldots, N$, together with $N$ additional eigenvalues $\xi_1 > \xi_2 > \cdots > \xi_N$.
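The block version of the statement can be spot-checked on one example. The sketch assumes a block structure of our own choosing ($n = 6$, block sizes $(3, 2, 1)$, $\nu = (0.6, 0.3, 0)$, $c = -0.1$, $v = 1$, $e = 0.03$), computes the spectrum through the similar symmetric matrix $\sqrt{C}\,E\,\sqrt{C}$ (so realness is built in), and counts how many eigenvalues coincide with each value $(q - e)(v + \nu_j - c)$, using an absolute tolerance as a numerical assumption.

```python
import numpy as np

# Illustrative block structure (assumed): n = 6, block sizes k = (3, 2, 1), nu = (0.6, 0.3, 0)
n, e, c, v = 6, 0.03, -0.1, 1.0
sizes = [3, 2, 1]
nu = np.array([0.6, 0.3, 0.0])
delta = np.repeat(nu, sizes)  # (0.6, 0.6, 0.6, 0.3, 0.3, 0.0)
q = 1 - (n - 1) * e
E = e * np.ones((n, n)) + (q - e) * np.eye(n)
C = c * np.ones((n, n)) + (v - c) * np.eye(n) + np.diag(delta)

# EC is similar to the symmetric matrix sqrt(C) E sqrt(C), so its eigenvalues are real
s, U = np.linalg.eigh(C)
sqrtC = U @ np.diag(np.sqrt(s)) @ U.T
xi = np.linalg.eigvalsh(sqrtC @ E @ sqrtC)

Lam = (q - e) * (v + nu - c)  # block values (q - e)(v + nu_j - c)
counts = [int(np.sum(np.isclose(xi, L, rtol=0, atol=1e-9))) for L in Lam]
print(counts, [k - 1 for k in sizes])  # [2, 1, 0] [2, 1, 0]
print(len(xi) - sum(counts))           # 3 = N additional eigenvalues
```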

Remark. The order of these eigenvalues, depending on the error value $e$ with respect to the critical error values $e^*_j = \dfrac{-c}{v + \nu_j - c}$, is the same as described in cases (I.) to (III.) above.